## Tutorial

Release: 4.0.3

Date: Feb 24, 2024

This guide can help you start working with CFPQ_Data.

**You can download this tutorial as a Jupyter Notebook from the link at the end of the page.**

### Import

First you need to import the package.

```python
import cfpq_data
```

### Load graph

After the package is imported, we can load the graphs.

#### Load graph archive from Dataset

We can download the archive with a graph using the function `download`. It returns the path to the extracted CSV file.

```python
bzip_path = cfpq_data.download("bzip")
```

```text
[2024-02-24 15:37:15]>INFO>Found graph with name='bzip'
[2024-02-24 15:37:16]>INFO>Load archive graph_archive=PosixPath('/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/cfpq_data/data/graphs/bzip.tar.gz')
[2024-02-24 15:37:16]>INFO>Unzip graph name='bzip' to file graph=PosixPath('/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/cfpq_data/data/graphs/bzip/bzip.csv')
[2024-02-24 15:37:16]>INFO>Remove archive graph_archive=PosixPath('/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/cfpq_data/data/graphs/bzip.tar.gz')
```


#### Load graph by path

We can load a graph from a given path using the function `graph_from_csv`. As the log shows, the result is a `networkx` `MultiDiGraph`.

```python
bzip = cfpq_data.graph_from_csv(bzip_path)
```

```text
[2024-02-24 15:37:16]>INFO>Load graph=<networkx.classes.multidigraph.MultiDiGraph object at 0x7f3de3882be0> from path=PosixPath('/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/cfpq_data/data/graphs/bzip/bzip.csv')
```


### Create graph

We can also create a synthetic graph using one of the generators in the Graph generators module.

#### Create a cycle graph

For example, let's create a cycle graph with 5 nodes whose edges are all labeled `a`.

```python
cycle = cfpq_data.labeled_cycle_graph(5, label="a")
```

```text
[2024-02-24 15:37:16]>INFO>Create a cycle graph=<networkx.classes.multidigraph.MultiDiGraph object at 0x7f3de38824f0> with n=5, label='a'
```
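As a rough picture of the edges such a labeled cycle contains, here is a plain-Python sketch that needs neither CFPQ_Data nor networkx. The helper `labeled_cycle_edges` is hypothetical, and the node numbering is an assumption: nodes `0..n-1` with edges `i -> (i + 1) % n`, as in networkx's `cycle_graph`. The log above does not show the numbering CFPQ_Data actually uses.

```python
# Hypothetical helper, not part of CFPQ_Data.
# Assumption: nodes are numbered 0..n-1 and edges run i -> (i + 1) % n.
def labeled_cycle_edges(n, label):
    """Return the edges of a directed cycle on n nodes as (u, v, label) triples."""
    return [(i, (i + 1) % n, label) for i in range(n)]


edges = labeled_cycle_edges(5, "a")
print(edges)
# [(0, 1, 'a'), (1, 2, 'a'), (2, 3, 'a'), (3, 4, 'a'), (4, 0, 'a')]
```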


### Change edges

We can change the edge labels of a graph by using the function `change_edges`
from Graph utilities. It takes a mapping from old labels to new ones and returns a new graph.

```python
new_cycle = cfpq_data.change_edges(cycle, {"a": "b"})
```

```text
[2024-02-24 15:37:16]>INFO>Change labels in graph=<networkx.classes.multidigraph.MultiDiGraph object at 0x7f3de38824f0> with mapping={'a': 'b'} to new_graph=<networkx.classes.multidigraph.MultiDiGraph object at 0x7f3e18b17e20>
```
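The relabeling idea can be sketched on a plain list of `(u, v, label)` triples. The helper `relabel_edges` is hypothetical, and it assumes that labels missing from the mapping are kept unchanged; check the Graph utilities reference for how `change_edges` itself treats such labels.

```python
# Hypothetical sketch, not CFPQ_Data's implementation.
# Assumption: labels that are not keys of `mapping` stay as they are.
def relabel_edges(edges, mapping):
    """Return a new edge list with labels replaced via `mapping`; the input is not modified."""
    return [(u, v, mapping.get(label, label)) for u, v, label in edges]


cycle_edges = [(i, (i + 1) % 5, "a") for i in range(5)]
relabeled = relabel_edges(cycle_edges, {"a": "b"})
print(relabeled)
# [(0, 1, 'b'), (1, 2, 'b'), (2, 3, 'b'), (3, 4, 'b'), (4, 0, 'b')]
print(cycle_edges[0])
# (0, 1, 'a')
```

Returning a new list mirrors the log above, where `change_edges` reports a `new_graph` object distinct from the input graph.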


Now every edge labeled `a` in `cycle` is labeled `b` in `new_cycle`.

#### Add reverse edges

In addition, we can add reverse edges to a graph by using the function `add_reverse_edges`
from Graph utilities. This is useful when a graph analysis is formulated in terms of such reverse edges.

```python
new_cycle_with_reversed = cfpq_data.add_reverse_edges(new_cycle)
```

```text
[2024-02-24 15:37:16]>INFO>Add reverse edges in graph=<networkx.classes.multidigraph.MultiDiGraph object at 0x7f3e18b17e20> with mapping=None to new_graph=<networkx.classes.multidigraph.MultiDiGraph object at 0x7f3e18b17d90>
```
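The construction can be sketched on plain `(u, v, label)` triples: for every edge `u -> v` with label `l`, add `v -> u` with label `l_r`. The `_r` suffix follows the description in this tutorial; the helper `with_reverse_edges` and its `suffix` parameter are hypothetical and do not model the optional `mapping` argument seen in the log.

```python
# Hypothetical sketch, not CFPQ_Data's implementation.
def with_reverse_edges(edges, suffix="_r"):
    """Return the original edges plus one reversed edge (v, u, label + suffix) per edge."""
    reversed_edges = [(v, u, label + suffix) for u, v, label in edges]
    return edges + reversed_edges  # builds a new list; `edges` is unchanged


cycle_b = [(0, 1, "b"), (1, 2, "b"), (2, 0, "b")]
result = with_reverse_edges(cycle_b)
print(result[3:])
# [(1, 0, 'b_r'), (2, 1, 'b_r'), (0, 2, 'b_r')]
```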


Now, for each edge labeled `b`, the graph `new_cycle_with_reversed` also contains the reversed edge labeled `b_r`.

### Load grammar

We can also load grammars generated from the grammar templates described on the Grammars page.

#### Load grammars archive from Dataset

We can load the archive with the grammars for a specified template using the function `download_grammars`.

```python
c_alias_path = cfpq_data.download_grammars("c_alias")
```

```text
[2024-02-24 15:37:16]>INFO>Found grammar template='c_alias'
[2024-02-24 15:37:16]>INFO>Load archive grammar_archive=PosixPath('/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/cfpq_data/data/grammars/c_alias.tar.gz')
[2024-02-24 15:37:16]>INFO>Unzip grammars with template='c_alias' for graph with graph_name=None to directory grammars=PosixPath('/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/cfpq_data/data/grammars/c_alias')
[2024-02-24 15:37:16]>INFO>Remove archive grammar_archive=PosixPath('/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/cfpq_data/data/grammars/c_alias.tar.gz')
```


#### Load grammars archive for a specified graph

For some grammar templates, we can also load the archive with the grammars for a specific graph.

```python
java_pt_avrora_path = cfpq_data.download_grammars("java_points_to", graph_name="avrora")
```

```text
[2024-02-24 15:37:16]>INFO>Found graph with graph_name='avrora' and grammar template='java_points_to'
[2024-02-24 15:37:17]>INFO>Load archive grammar_archive=PosixPath('/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/cfpq_data/data/grammars/java_points_to_avrora.tar.gz')
[2024-02-24 15:37:17]>INFO>Unzip grammars with template='java_points_to' for graph with graph_name='avrora' to directory grammars=PosixPath('/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/cfpq_data/data/grammars/java_points_to_avrora')
[2024-02-24 15:37:17]>INFO>Remove archive grammar_archive=PosixPath('/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/cfpq_data/data/grammars/java_points_to_avrora.tar.gz')
```


### Regular grammars

Currently, we have one representation of regular grammars:

1. [Regular expression](https://en.wikipedia.org/wiki/Regular_expression#Formal_definition)

#### Create a regular expression

For example, a regular expression can be created by using the function `regex_from_text`
from Reading and writing grammars.

```python
regex = cfpq_data.regex_from_text("a (bc|d*)")
```

```text
[2024-02-24 15:37:17]>INFO>Create regex=(a.(bc|(d)*)) from text='a (bc|d*)'
```
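To see which symbol sequences this regular expression describes, here is a sketch using Python's standard `re` module. In the logged form `(a.(bc|(d)*))`, a concatenation dot follows `a` but none appears inside `bc`, which suggests `bc` is read as a single symbol; the sketch assumes this. Joining symbols with single spaces lets `re.fullmatch` check membership in `a (bc | d*)`. The helper `in_language` is hypothetical.

```python
import re

# Assumption: a space means concatenation and "bc" is one symbol, so the
# language is: "a" followed by either the symbol "bc" or zero or more "d".
PATTERN = re.compile(r"a( bc|( d)*)")


def in_language(symbols):
    """Return True if the list of symbols belongs to a (bc | d*)."""
    return PATTERN.fullmatch(" ".join(symbols)) is not None


print(in_language(["a"]))            # True: d* matches zero symbols
print(in_language(["a", "bc"]))      # True
print(in_language(["a", "d", "d"]))  # True
print(in_language(["a", "b", "c"]))  # False: "b" and "c" are separate symbols
```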


#### Load regular expression by path

We can load a regular expression from a given path using the function `regex_from_txt`. Below, `regex` is first saved to `test.txt` with `regex_to_txt` and then read back.

```python
path = cfpq_data.regex_to_txt(regex, "test.txt")
regex_by_path = cfpq_data.regex_from_txt(path)
```

```text
[2024-02-24 15:37:17]>INFO>Turn regex=(a.(bc|(d)*)) into text='(a (bc|(d)*))'
[2024-02-24 15:37:17]>INFO>Save regex=(a.(bc|(d)*)) to dest=PosixPath('/home/runner/work/CFPQ_Data/CFPQ_Data/docs/test.txt')
[2024-02-24 15:37:17]>INFO>Create regex=(a.(bc|(d)*)) from text='(a (bc|(d)*))'
[2024-02-24 15:37:17]>INFO>Create regex=(a.(bc|(d)*)) from path=PosixPath('/home/runner/work/CFPQ_Data/CFPQ_Data/docs/test.txt')
```


### Context-free grammars

Currently, we have three representations of context-free grammars (CFGs):

1. [Classic](https://en.wikipedia.org/wiki/Context-free_grammar#Formal_definitions)

1. [Chomsky Normal Form](https://en.wikipedia.org/wiki/Chomsky_normal_form)

1. [Recursive State Machine](https://link.springer.com/chapter/10.1007/978-3-030-54832-2_6#Sec2)

#### Create a classic context-free grammar

A classic context-free grammar can be created by using the function `cfg_from_text`
from Reading and writing grammars.

```python
cfg = cfpq_data.cfg_from_text("S -> a S b S | a b")
```

```text
[2024-02-24 15:37:17]>INFO>Create cfg=<pyformlang.cfg.cfg.CFG object at 0x7f3de389f370> from text='S -> a S b S | a b', start_symbol=Variable(S)
```


#### Load context-free grammar by path

We can load a classic context-free grammar from a given path using the function `cfg_from_txt`. Below, `cfg` is first saved to `test.txt` with `cfg_to_txt` and then read back.

```python
path = cfpq_data.cfg_to_txt(cfg, "test.txt")
cfg_by_path = cfpq_data.cfg_from_txt(path)
```

```text
[2024-02-24 15:37:17]>INFO>Turn cfg=<pyformlang.cfg.cfg.CFG object at 0x7f3de389f370> into text='S -> a S b S\nS -> a b'
[2024-02-24 15:37:17]>INFO>Save cfg=<pyformlang.cfg.cfg.CFG object at 0x7f3de389f370> to dest=PosixPath('/home/runner/work/CFPQ_Data/CFPQ_Data/docs/test.txt')
[2024-02-24 15:37:17]>INFO>Create cfg=<pyformlang.cfg.cfg.CFG object at 0x7f3de3882bb0> from text='S -> a S b S\nS -> a b', start_symbol=Variable(S)
[2024-02-24 15:37:17]>INFO>Create cfg=<pyformlang.cfg.cfg.CFG object at 0x7f3de3882bb0> from path=PosixPath('/home/runner/work/CFPQ_Data/CFPQ_Data/docs/test.txt'), start_symbol=Variable(S)
```
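The log of `cfg_to_txt` shows that the saved text lists one production per line (`'S -> a S b S\nS -> a b'`), so the `|` alternatives of the input are expanded. Here is a minimal plain-Python sketch of that rewriting with the hypothetical helper `expand_alternatives`. It is not how pyformlang or CFPQ_Data produce the text, and it ignores special cases such as epsilon productions.

```python
# Hypothetical helper: expand "Head -> alt1 | alt2" into one line per alternative.
def expand_alternatives(text):
    """Rewrite each rule so that every alternative becomes its own 'Head -> body' line."""
    lines = []
    for rule in text.splitlines():
        if not rule.strip():
            continue  # skip blank lines
        head, bodies = rule.split("->", 1)
        for body in bodies.split("|"):
            # Normalize whitespace between symbols of the body.
            lines.append(f"{head.strip()} -> {' '.join(body.split())}")
    return "\n".join(lines)


print(expand_alternatives("S -> a S b S | a b"))
# S -> a S b S
# S -> a b
```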


### Generate grammar

We can also generate a grammar for a specified template using one of the generators in the Grammar generators module.

#### Generate a Dyck grammar

For example, let's generate a Dyck grammar for balanced strings with `a` as the opening parenthesis and `b` as the closing parenthesis, excluding the empty string.

```python
dyck_cfg = cfpq_data.dyck_grammar([("a", "b")], eps=False)
```

```text
[2024-02-24 15:37:17]>INFO>Create a Dyck cfg=<pyformlang.cfg.cfg.CFG object at 0x7f3de37ad3a0> with types=[('a', 'b')], eps=False
```
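To make the generated language concrete, here is a plain-Python recognizer for balanced strings over given parenthesis pairs, using the standard stack-based check. The helper `is_dyck` is hypothetical and not part of CFPQ_Data; its `eps` flag only mirrors the idea of the parameter above by deciding whether the empty string is accepted.

```python
# Hypothetical recognizer for the Dyck language over `pairs`, not CFPQ_Data code.
def is_dyck(word, pairs, eps=False):
    """Return True if `word` (a sequence of symbols) is balanced with respect to `pairs`."""
    if not word:
        return eps  # the empty string is accepted only when eps is True
    closer_for = {opening: closing for opening, closing in pairs}
    closers = set(closer_for.values())
    stack = []  # expected closing symbols, innermost last
    for symbol in word:
        if symbol in closer_for:
            stack.append(closer_for[symbol])
        elif symbol in closers:
            if not stack or stack.pop() != symbol:
                return False  # unmatched or mismatched closing symbol
        else:
            return False  # symbol outside the alphabet
    return not stack  # every opening symbol must be closed


pairs = [("a", "b")]
print(is_dyck("aabb", pairs))  # True
print(is_dyck("abab", pairs))  # True
print(is_dyck("ba", pairs))    # False
print(is_dyck("", pairs))      # False, because eps=False
```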


#### Generate a Java Points-to grammar

Also, let's generate a Java Points-to grammar for the field-sensitive analysis of Java programs with field names `f0` and `f1`.

```python
java_pt_cfg = cfpq_data.java_points_to_grammar(["f0", "f1"])
```

```text
[2024-02-24 15:37:17]>INFO>Create a Java Points-to cfg=<pyformlang.cfg.cfg.CFG object at 0x7f3de37ade80> with fields=['f0', 'f1']
```


### Benchmarks

In addition, you can download one of the prepared benchmarks. Each benchmark contains graphs, queries, other input data, and results
for a particular formal-language-constrained path querying problem.

Currently, we provide the following benchmarks documented on the Benchmarks page:

1. MS_Reachability

#### Load benchmark archive

You can load the archive with a benchmark using the function `download_benchmark`.

```python
ms_reachability_path = cfpq_data.download_benchmark("MS_Reachability")
```

```text
[2024-02-24 15:37:17]>INFO>Found benchmark with name='MS_Reachability'
[2024-02-24 15:37:19]>INFO>Load archive benchmark_archive=PosixPath('/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/cfpq_data/data/benchmarks/MS_Reachability.tar.gz')
[2024-02-24 15:37:19]>INFO>Unzip benchmark name='MS_Reachability' to directory benchmark=PosixPath('/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/cfpq_data/data/benchmarks/MS_Reachability')
[2024-02-24 15:37:19]>INFO>Remove archive benchmark_archive=PosixPath('/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/site-packages/cfpq_data/data/benchmarks/MS_Reachability.tar.gz')
```


#### MS_Reachability benchmark

The MS_Reachability benchmark can be used to experimentally study algorithms that solve the multiple-source
formal-language-constrained reachability problem. This benchmark is described on the MS_Reachability page.
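To make the multiple-source problem concrete, here is a minimal sketch restricted to a regular constraint given as a DFA. Context-free constraints, the main focus of CFPQ_Data, need more machinery, so this only illustrates the problem shape: for each source `s`, find every vertex `v` such that some path from `s` to `v` spells a word in the language. It runs a breadth-first search over pairs (graph vertex, automaton state), once per source. All names (`ms_regular_reachability`, the DFA encoding as a dict) are hypothetical and not part of CFPQ_Data or the benchmark.

```python
from collections import deque


def ms_regular_reachability(edges, sources, dfa, start, finals):
    """Return {(s, v)} such that some path from s to v spells a word accepted by the DFA.

    edges:  iterable of (u, v, label) triples
    dfa:    dict mapping (state, label) -> next state; a missing key means reject
    start:  initial DFA state; finals: set of accepting DFA states
    """
    adjacency = {}
    for u, v, label in edges:
        adjacency.setdefault(u, []).append((v, label))

    result = set()
    for s in sources:
        # BFS over the product of the graph and the automaton, starting at (s, start).
        seen = {(s, start)}
        queue = deque([(s, start)])
        while queue:
            node, state = queue.popleft()
            if state in finals:
                result.add((s, node))
            for nxt, label in adjacency.get(node, []):
                nxt_state = dfa.get((state, label))
                if nxt_state is not None and (nxt, nxt_state) not in seen:
                    seen.add((nxt, nxt_state))
                    queue.append((nxt, nxt_state))
    return result


# Constraint: the single word "a b" (DFA 0 -a-> 1 -b-> 2, accepting state 2).
graph = [(0, 1, "a"), (1, 2, "b"), (2, 3, "a")]
dfa = {(0, "a"): 1, (1, "b"): 2}
print(ms_regular_reachability(graph, {0, 2}, dfa, start=0, finals={2}))
# {(0, 2)}
```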

For this benchmark, we provide some useful functions from
Graph utilities.
For example, a set of source vertices can be saved to a TXT file with `multiple_source_to_txt`,
and a set stored in a TXT file (such as one from the benchmark) can be loaded with
`multiple_source_from_txt`.

```python
s = {1, 2, 5, 10}
path = cfpq_data.multiple_source_to_txt(s, "test.txt")
source_vertices = cfpq_data.multiple_source_from_txt(path)
```

```text
[2024-02-24 15:37:19]>INFO>Save source_vertices={1, 2, 10, 5} to dest=PosixPath('/home/runner/work/CFPQ_Data/CFPQ_Data/docs/test.txt')
[2024-02-24 15:37:19]>INFO>Load source_vertices={1, 2, 10, 5} from path=PosixPath('/home/runner/work/CFPQ_Data/CFPQ_Data/docs/test.txt')
```
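The save-and-load round trip can be sketched with the standard library alone. This assumes a simple format of one integer vertex per line; the file layout that `multiple_source_to_txt` actually writes is not shown above and may differ. The helpers `save_sources` and `load_sources` are hypothetical.

```python
import os
import tempfile


# Hypothetical helpers; assumed format: one integer vertex id per line.
def save_sources(sources, path):
    """Write the vertices one per line, sorted so the file content is stable."""
    with open(path, "w") as f:
        for vertex in sorted(sources):
            f.write(f"{vertex}\n")
    return path


def load_sources(path):
    """Read a set of integer vertex ids, ignoring blank lines."""
    with open(path) as f:
        return {int(line) for line in f if line.strip()}


with tempfile.TemporaryDirectory() as tmp:
    path = save_sources({1, 2, 5, 10}, os.path.join(tmp, "sources.txt"))
    loaded = load_sources(path)

print(loaded == {1, 2, 5, 10})  # True
```

Sorting before writing is a design choice for reproducible files: Python sets have no guaranteed iteration order, which is why the log above prints `{1, 2, 10, 5}`.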
