rsmsn_blog/parsing_files_with_python.md at b55aa0326007d9a00b0c47708942e7d5347ea3ff

Files

Norm Rasmussen 6278b1214c Neovim Search and Replace commands moved to final stages, no longer draft. Added some shell-ideas for later writing.

2023-11-15 21:58:41 -05:00

1.9 KiB

Raw Blame History

title, date, tags, author, showToc, TocOpen, draft, hidemeta, description, disableHLJS, disableShare, disableHLJS, hideSummary, searchHidden, ShowReadingTime, ShowBreadCrumbs, ShowPostNavLinks, ShowWordCount, ShowRssButtonInSectionTermList, UseHugoToc, cover

title

date

Full Script

import csv
import pandas as pd
import re

LISTTUPLE = []
LINELIST = []
COUNT = 0
DOMAIN_DICT = {}
df = pd.DataFrame()

with open('./Workflows_js_nodes.js', 'r') as file:
    for num, line in enumerate(file, 1):
        if "<<<" in line:
            LINELIST.append(num)
        if ">>>" in line:
            LINELIST.append(num)
LINELIST = sorted(LINELIST)
# print(LINELIST)
x = len(LINELIST)

try:
    while COUNT in range(x):
        COUNT += 1
        temp_tupe = (LINELIST[0], LINELIST[1])
        LISTTUPLE.append(temp_tupe)
        LINELIST = LINELIST[2:]
        # LINELIST.pop(1)
except IndexError as e:
    pass

for pagetuple in LISTTUPLE:
    res_list = []
    domain_line = int(pagetuple[0]-2)
    seg_start = int(pagetuple[0]-1)
    seg_end = int(pagetuple[1]-1)
    with open('./Workflows_js_nodes.js', 'r') as file:
        lines = file.readlines()
        title = lines[domain_line][4:-1]
        segment = lines[seg_start:seg_end]
        for line in segment:
            result = re.search(r"(?:'@[a-z|.]+.[a-z]{3})", line)
            if result:
                res = result.group()[1:]
                res_list.append(res)
        DOMAIN_DICT[title] = res_list
df = df.from_dict(DOMAIN_DICT, orient='index')
df.to_csv('~/export_file.csv')

1.9 KiB Raw Blame History

Full Script

1.9 KiB

Raw Blame History